CLEVER: clique-enumerating variant finder

Authors

Tobias Marschall

Ivan G. Costa

Stefan Canzar

Markus Bauer

Gunnar W. Klau

Alexander Schliep

Alexander Schönhuth

Abstract

MOTIVATION: Next-generation sequencing techniques have facilitated a large-scale analysis of human genetic variation. Despite the advances in sequencing speed, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals.

RESULTS: Here, we present a novel internal segment size based approach, which organizes all, including concordant, reads into a read alignment graph, where max-cliques represent maximal contradiction-free groups of alignments. A novel algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions. For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present relevant performance statistics. We achieve superior performance, in particular, for deletions or insertions (indels) of length 20-100 nt. This has been previously identified as a remaining major challenge in structural variation discovery, in particular, for insert size based approaches. In this size range, we even outperform split-read aligners. We achieve competitive results also on biological data, where our method is the only one to make a substantial amount of correct predictions, which, additionally, are disjoint from those by split-read aligners.

AVAILABILITY: CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com.

CONTACT: [email protected] or [email protected].

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
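The idea of a read alignment graph whose maximal cliques are contradiction-free groups of alignments can be illustrated with a small sketch. The code below is not the algorithm from the paper, which is described as novel and which also evaluates each clique statistically. It swaps in the textbook Bron-Kerbosch enumeration with pivoting, and the compatibility rule (two alignments are joined by an edge when their intervals overlap and their insert sizes differ by at most `max_size_diff`) is a simplifying assumption made only for illustration. The names `build_graph`, `maximal_cliques`, and `max_size_diff` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy record: (start, end, insert_size) of one paired-end alignment.
Alignment = Tuple[int, int, int]


def build_graph(alignments: List[Alignment], max_size_diff: int) -> Dict[int, Set[int]]:
    """Connect two alignments if they overlap and have similar insert sizes.

    This edge rule is an assumption for illustration, not the paper's criterion.
    """
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(alignments))}
    for i, (s1, e1, z1) in enumerate(alignments):
        for j in range(i + 1, len(alignments)):
            s2, e2, z2 = alignments[j]
            overlaps = s1 <= e2 and s2 <= e1
            if overlaps and abs(z1 - z2) <= max_size_diff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques: List[List[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v is done: remove it from the candidates
            x = x | {v}  # and exclude it from later cliques

    expand(set(), set(adj), set())
    return sorted(cliques)


if __name__ == "__main__":
    toy = [
        (100, 300, 250), (150, 350, 260), (200, 400, 255),  # similar insert sizes
        (250, 450, 400), (300, 500, 410),                   # larger insert sizes
        (900, 1100, 250),                                   # far away, isolated
    ]
    print(maximal_cliques(build_graph(toy, max_size_diff=20)))
```

On the toy input, the three overlapping alignments with insert sizes near 250 form one clique, the two with insert sizes near 400 form another, and the distant alignment is a clique on its own. In a variant caller, each such group would then be scored for how strongly it supports an insertion or deletion.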


Related articles

MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels

MOTIVATION Accurately predicting and genotyping indels longer than 30 bp has remained a central challenge in next-generation sequencing (NGS) studies. While indels of up to 30 bp are reliably processed by standard read aligners and the Genome Analysis Toolkit (GATK), longer indels have still resisted proper treatment. Also, discovering and genotyping longer indels has become particularly releva...


Large Maximal Cliques Enumeration in Large Sparse Graphs

Identifying communities in social networks is a problem of great interest. One popular type of community is where every member of the community knows all others, which can be viewed as a clique in the graph representing the social network. In several real life situations, finding small cliques may not be interesting as they are large in number and low in information content. Hence, in this pape...


A Fast Parallel Maximum Clique Algorithm for Large Sparse Graphs and Temporal Strong Components

We propose a fast, parallel, maximum clique algorithm for large, sparse graphs that is designed to exploit characteristics of social and information networks. We observe roughly linear runtime scaling over graphs between 1000 vertices and 100M vertices. In a test with a 1.8 billion-edge social network, the algorithm finds the largest clique in about 20 minutes. For social networks, in particula...


What if CLIQUE were fast? Maximum Cliques in Information Networks and Strong Components in Temporal Networks

Exact maximum clique finders have progressed to the point where we can investigate cliques in million-node social and information networks, as well as find strongly connected components in temporal networks. We use one such finder to study a large collection of modern networks emanating from biological, social, and technological domains. We show inter-relationships between maximum cliques and s...


An Efficient Algorithm for Enumerating Pseudo Cliques

The problem of finding dense structures in a given graph is quite basic in informatics including data mining and data engineering. Clique is a popular model to represent dense structures, and widely used because of its simplicity and ease in handling. Pseudo cliques are natural extension of cliques which are subgraphs obtained by removing small number of edges from cliques. We here define a pse...



Journal:

Bioinformatics

Volume 28, Issue 22

Pages: -

Publication date: 2012
